Identifying Prediction Mistakes in Observational Data

The Quarterly Journal of Economics, August 2024

Ashesh Rambachan

Presenter: Joe Paul

Motivation

  • Expert decision-makers make consequential decisions based on predictions of an unknown outcome
  • Common Situation in Economics:
    • Judges deciding pretrial release.
      • Judges must predict whether a defendant will fail to appear in court
    • Doctors making diagnoses
      • Predicting whether a patient suffers from some undiagnosed condition
    • Managers making hiring decisions
      • Which applicants should be hired based on predictions of their future productivity

Empirical Example

For a particular judge, we can observe their release rate at each predicted risk decile, as well as the failure to appear rate. Question: are these patterns of pretrial release decisions rationalisable as maximising expected utility at accurate predictions of failure to appear risk, given some private information and a utility function that may vary across the four cells?

Key Questions

Do experts make systematic prediction mistakes based on the available information?

If so:

  • Which decision-makers?
  • On which decisions?
  • In which ways are predictions systematically biased?

Core Challenges

Three key identification challenges make this a hard econometric problem:

  1. Private information: decision-makers observe information relevant to predicting the outcome that the researcher does not
  2. Unknown preferences: utility functions may vary across decisions
  3. Missing data: outcomes are only selectively observed, for some of the decision-maker’s choices

How can we tell if a decision maker’s choices reflect systematic prediction mistakes vs optimal behaviour at unknown preferences and information sets?

Existing Research

Existing work relies on strong assumptions or structural models tailored to each empirical setting:

  • Restrict preferences to be fixed across decisions and DMs
  • Treat observed choices as good as randomly assigned (assuming away private information)
  • Impose parametric models of private information

This Paper

A unifying framework to analyse systematic prediction mistakes under weak assumptions on preferences and information sets in general observational settings.

Overview

  • Analyse through the lens of subjective expected utility maximisation
  • Test if choices are consistent with accurate beliefs.
  • Characterise magnitude and nature of systematic mistakes
  • Apply to pretrial release decisions in NYC

Model Setup: Characteristics

Single Decision Maker’s Information:

  • Each decision is summarised by characteristics \(X\) that are observed by both the decision-maker and the researcher
    • Pretrial Release example: Charge information, arrest record, etc
  • For each decision, we observe a binary choice \(C \in \{ 0, 1 \}\)
  • Each decision is associated with some latent outcome \(Y^*\) that is unknown to the decision maker at the time of their choice.

Missing data problem

We only observe the latent outcome if \(C=1\): \[Y := C \times Y^*\]

  • We observe \(P(Y^* \mid C=1, X)\) but not \(P(Y^* \mid C=0, X)\), so \(P(Y^* \mid X)\) is only partially identified.
  • Example: we only observe failure to appear if the judge grants bail, so the failure to appear rate among all defendants is only identified up to a set.
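With a binary outcome, this selection problem delivers the familiar worst-case (Manski-style) bounds on \(P(Y^* = 1 \mid X)\): the detained arm can contribute anything between 0 and 1. A minimal sketch, with hypothetical cell-level numbers:

```python
# Worst-case bounds on P(Y* = 1 | X) when Y* is observed only if C = 1.
# All numbers are hypothetical, for one value of X.
p_release = 0.7        # P(C = 1 | X)
p_fta_released = 0.2   # P(Y* = 1 | C = 1, X): point identified

# P(Y* = 1 | C = 0, X) can be anything in [0, 1], so by the law of
# total probability P(Y* = 1 | X) is only bounded:
lower = p_fta_released * p_release + 0.0 * (1 - p_release)
upper = p_fta_released * p_release + 1.0 * (1 - p_release)

print((round(lower, 2), round(upper, 2)))  # → (0.14, 0.44)
```

The width of the interval is exactly the detention rate \(P(C=0 \mid X)\), which is why additional (e.g. quasi-experimental) variation is needed to tighten the bounds.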

Observable Data

  • We observe dataset \((X, C, Y) \sim_{i.i.d} \mathcal{P}\) of each decision made by a DM. For each decision we observe
    • Characteristics \(X\)
    • DM’s Choice \(C\)
    • Selectively observed outcome \(Y = C \cdot Y^*\)

Identification

  • Analyse DM’s choices through a nonparametric expected utility maximisation problem, given:
    • Subjective beliefs about outcome given characteristics (Unknown)
    • Utility function (Unknown)
    • Private information (Unknown)
  • Given all these unknowns, can we answer the question: are the DM’s subjective beliefs about the outcome given characteristics accurate, i.e., do they lie in the identified set for \(P(Y^* \mid X)\)?
  • If not, conclude the DM’s choices reflect systematic prediction mistakes.

Identification

  • Analyse DM’s choices through a nonparametric expected utility maximisation problem, given:
    • Subjective beliefs about outcome given characteristics (Unknown)
    • Utility function (Unknown)
    • Private information (Unknown)
  • How do we draw inferences about the DM’s subjective beliefs about \(Y^* \mid X\) using \((X, C, Y) \sim \mathcal{P}\), without knowledge of the DM’s utility function and private information?

Results

  1. Identification: Are the DM’s choices consistent with expected utility maximisation at accurate beliefs?
    • Econometric assumptions addressing the missing data problem, and behavioural assumptions on the DM’s utility function

Results

  1. Identification: Are the DM’s choices consistent with expected utility maximisation at accurate beliefs?
    • Econometric assumptions addressing the missing data problem, and behavioural assumptions on the DM’s utility function
  2. Characterisation: In what ways are the DM’s predictions systematically biased?
    • Bound how common and costly are systematic prediction mistakes
    • Bound whether the DM’s subjective beliefs over- or under-react to characteristics
    • Quasi-experimental variation \(+\) out-of-sample prediction \(\to\) confidence intervals

Information and Beliefs


  • No assumptions are placed on the distribution of the DM’s private information \(V\); it is modelled nonparametrically


\[ \underbrace{Q(Y^* \mid V, X)}_{\text{Posterior}} \propto \underbrace{Q(V \mid Y^*, X)}_{\text{Likelihood}} \underbrace{Q(Y^* \mid X)}_{\text{Subjective Beliefs}} \]

  • Goal: draw inferences about the DM’s subjective beliefs \(Q(Y^* \mid X)\) using \((X, C, Y) \sim \mathcal{P}\), i.e., using data on choices that reflect the posterior, while leaving the private information and the likelihood unspecified.
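The proportionality can be sanity-checked on a toy discrete example (all numbers hypothetical): the posterior over \(Y^*\) is the normalised product of the likelihood of the realised signal and the subjective prior.

```python
# Posterior ∝ likelihood × subjective belief, on a toy binary example.
prior = {0: 0.8, 1: 0.2}       # Q(Y* | X): subjective beliefs (hypothetical)
likelihood = {0: 0.3, 1: 0.9}  # Q(V = v | Y*, X) for one realised signal v

unnorm = {y: likelihood[y] * prior[y] for y in (0, 1)}
z = sum(unnorm.values())                        # normalising constant
posterior = {y: unnorm[y] / z for y in (0, 1)}  # Q(Y* | V = v, X)

print(round(posterior[1], 3))  # → 0.429
```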

Utility function

  • Suppose the researcher partitions \(X = (X_{0}, X_{1})\)

  • \(u(c, y^*; x_{0})\): payoff of choice \(c\) at outcome \(y^*\) and characteristics \(x_{0}\)

  • Exclusion Restriction: characteristics \(X_{0}\) directly affect both the utility function and beliefs, whereas the other characteristics \(X_{1}\) and private information \(V\) only affect beliefs.

Utility function

  • Suppose the researcher partitions \(X = (X_{0}, X_{1})\)

  • \(u(c, y^*; x_{0})\): payoff of choice \(c\) at outcome \(y^*\) and characteristics \(x_{0}\)

  • Exclusion Restriction: characteristics \(X_{0}\) directly affect both the utility function and beliefs, whereas the other characteristics \(X_{1}\) and private information \(V\) only affect beliefs.

\(\implies\) Conditional on \(X_{0}\), variation in the DM’s choice probabilities across \(X_{1}\) reflects only variation in posterior beliefs \(Q(Y^* \mid V, X)\), not variation in the utility function \(u(c, y^*; x_{0})\)

Utility function

  • Suppose the researcher partitions \(X = (X_{0}, X_{1})\)

  • \(u(c, y^*; x_{0})\): payoff of choice \(c\) at outcome \(y^*\) and characteristics \(x_{0}\)

  • Exclusion Restriction: characteristics \(X_{0}\) directly affect both the utility function and beliefs, whereas the other characteristics \(X_{1}\) and private information \(V\) only affect beliefs.

Example

  • Pretrial Release: the judge’s utility function only directly depends on

    • Defendant’s race (Discrimination)
    • Defendant’s age
    • Defendant’s charge severity

Utility Maximisation

  • Given a utility function, expected utility maximisation yields the set of feasible joint distributions \(Q\) over \((X, V, C, Y^*)\) that satisfy the following two conditions: \[ \begin{cases} 1. \; C \in \arg\max_{c'} \mathbb{E}_{Q}\left[ u(c', Y^*; X_{0}) \mid X, V \right], \\ 2. \; C \perp Y^* \mid X, V. \end{cases} \]
  • A DM’s observed choices are consistent with expected utility maximisation at accurate beliefs if there exists a utility function and an associated joint distribution \(Q\) under the expected utility model s.t.: \[ \underbrace{Q(X, C, Y)}_{\text{model}} = \underbrace{P(X, C, Y)}_{{\text{data}}}, \text{ where } Y = C \cdot Y^* \] That is, we can construct a utility function, accurate beliefs, private information, and a tie-breaking rule that reproduce the observable data (the decision-maker’s choices).

Interpretation

  • The DM is making a Systematic Prediction Mistake based on \(X\) if choices are inconsistent with expected utility maximisation at accurate beliefs
  • There is no combination of utility function, accurate beliefs, private information, tie-breaking rule that would reproduce observable data (the decision maker’s choices).

Pretrial Release

  • Focus on a class of linear utility functions (linear in the outcome)
    • Costly to release defendants who would fail to appear, and to detain defendants who would not fail to appear
  • Are the DM’s choices consistent w/ expected utility maximisation at accurate beliefs, for some private information and some linear utility function?

Characterisation

To show a DM’s choices are consistent w/ expected utility maximisation at accurate beliefs, some linear utility function, and some private information, it is sufficient to show that \(\forall x_{0}\) \[ \max_{\tilde{x}_{1}} \mathbb{E}\left[ \sum_{k} Y_{k}^* \mid C=1, X=(x_{0}, \tilde{x}_{1}) \right] \leq \min_{\tilde{x}_{1}} \mathbb{E}\left[ \sum_{k} Y_{k}^* \mid C=0, X=(x_{0}, \tilde{x}_{1}) \right] \] given \[ \underbrace{Y^* \mid \{ C=1, X \}}_{\text{point identified}} \text{ and } \underbrace{Y^* \mid \{ C=0, X \}}_{\text{not identified}} \]

  • We observe the conditional distribution of \(Y^*\) for those released, but not for those detained.

\(\implies\) The DM’s choices are inconsistent w/ expected utility maximisation at accurate beliefs iff there exists no distribution for \(Y^* \mid \{ C=0, X \}\) that satisfies the inequalities.

Characterisation

\[ \max_{\tilde{x}_{1}} \mathbb{E}\left[ \sum_{k} Y_{k}^* \mid C=1, X=(x_{0}, \tilde{x}_{1}) \right] \leq \min_{\tilde{x}_{1}} \mathbb{E}\left[ \sum_{k} Y_{k}^* \mid C=0, X=(x_{0}, \tilde{x}_{1}) \right] \] given \[ \underbrace{Y^* \mid \{ C=1, X \}}_{\text{point identified}} \text{ and } \underbrace{Y^* \mid \{ C=0, X \}}_{\text{not identified}} \]

  • Intuitively, over the class of linear utility functions, EU maximisation requires the DM to make choices according to an (incomplete) threshold rule on their posterior beliefs. Expected utility maximisation at accurate beliefs and some private information is observationally equivalent to behaviour generated by EU maximisation under beliefs governed by these two inframarginal conditional expectations.
  • We therefore only need to check whether the decision-maker’s choices can be reproduced under a threshold rule on these inframarginal conditional expectations; one is point identified, the other is not, but we can obtain informative bounds on it using quasi-experimental variation.
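The resulting check can be sketched as follows. Released-cell means are point identified from the data; the detained-cell values below stand in for the upper bounds obtained from quasi-experimental variation (all numbers hypothetical):

```python
# For a fixed x0, check the ranking condition across x1 cells:
#   max_{x1} E[Y* | C=1, (x0, x1)]  <=  min_{x1} E[Y* | C=0, (x0, x1)].
# Using upper bounds on the detained arm, a violation means no feasible
# detained-arm distribution can rationalise the choices.
released_mean = {"x1=a": 0.10, "x1=b": 0.35}   # point identified
detained_upper = {"x1=a": 0.25, "x1=b": 0.60}  # upper bounds (missing arm)

consistent = max(released_mean.values()) <= min(detained_upper.values())
print(consistent)  # → False: a misranking, i.e. a systematic prediction mistake
```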

Problem

With the assumptions so far, the DM’s choices are always consistent w/ EU max. at some linear utility function and accurate beliefs.


We therefore need to make the following assumptions:

Econometric Assumptions to construct informative bounds on missing data

\[ \underline{\mathbb{E}} \left[ \sum_{k}Y_{k}^* \mid C=0, X \right] \leq {\mathbb{E}} \left[ \sum_{k}Y_{k}^* \mid C=0, X \right] \leq \overline{\mathbb{E}} \left[ \sum_{k}Y_{k}^* \mid C=0, X \right] \]

Behavioural Assumptions that exclude characteristics \(X_{1}\) and private information \(V\) from directly affecting the utility function.

Problem

With the assumptions so far, the DM’s choices are always consistent w/ EU max. at some linear utility function and accurate beliefs.


We therefore need to make the following assumptions:

  1. Econometric Assumptions to construct informative bounds on missing data

\[ \underline{\mathbb{E}} \left[ \sum_{k}Y_{k}^* \mid C=0, X \right] \leq {\mathbb{E}} \left[ \sum_{k}Y_{k}^* \mid C=0, X \right] \leq \overline{\mathbb{E}} \left[ \sum_{k}Y_{k}^* \mid C=0, X \right] \]

  2. Behavioural Assumptions that exclude characteristics \(X_{1}\) and private information \(V\) from directly affecting the utility function.

We estimate these bounds through

Instrumental Variable Bounds
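A sketch of the idea behind IV bounds (not the paper's exact estimator): if quasi-random assignment makes the instrument \(Z\) (here, which judge a case is assigned to) independent of \(Y^*\) given \(X\), each value of \(Z\) gives worst-case bounds on \(\mathbb{E}[Y^* \mid X]\), and intersecting the bounds across \(Z\) tightens them. All numbers hypothetical:

```python
# Intersect worst-case bounds on E[Y* | X] across instrument values z.
# Per-z cells: (P(C=1 | X, z), E[Y* | C=1, X, z]), with binary Y*.
cells = {
    "lenient_judge": (0.9, 0.22),
    "strict_judge":  (0.5, 0.10),
}

# For each z: E[Y* | X] lies in [p*m, p*m + (1 - p)]; intersect over z.
lo = max(p * m for p, m in cells.values())
hi = min(p * m + (1 - p) for p, m in cells.values())

print((round(lo, 3), round(hi, 3)))  # → (0.198, 0.298)
```

The lenient judge's high release rate shrinks the upper bound, which in turn tightens the implied bound on the detained arm \(\mathbb{E}[Y^* \mid C=0, X]\) for other judges.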

Pretrial Release Application

Data:

  • 570k cases in NYC (2008-2013)
  • 265 judges
  • Focus on top 25 judges by volume (approx 50% of all cases)
  • Outcome: Failure to appear in court

Observed FTA rates

Question: Are these patterns of pretrial release decisions rationalisable as maximising expected utility at accurate predictions of failure to appear risk given some private information and a utility function that can vary across the four cells?

Bounds

Using the quasi-random assignment of judges to cases, the instrumental variable bounds yield an upper bound (the blue curve) on the failure to appear rate among defendants detained by this particular judge, at each defendant cell and predicted risk decile. The identification result then simply asks whether there are any misrankings in this judge’s choices. Based on the point estimates, the answer is yes: this judge releases defendants at the top of the predicted risk distribution with a higher observed failure to appear risk, while simultaneously detaining defendants at the bottom of the predicted risk distribution with a strictly lower worst-case failure to appear risk. Exchanging these choices would do strictly better no matter the judge’s utility function and private information.

These are point estimates; formally, there should be standard errors around them, and we would test the misranking inequalities accounting for sampling uncertainty.

Implementation

For each judge:

  1. Calculate release rates by defendant cells
  2. Observe failure to appear rates for released defendants
  3. Bound failure to appear rates for detained defendants
  4. Test for choice misrankings
  5. Account for statistical uncertainty
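Steps 1–4 might be organised per judge roughly as follows; the data layout and the pre-computed upper bounds are hypothetical stand-ins for the estimation steps above:

```python
# Count misrankings per judge across defendant cells (hypothetical data).
# Each cell: (released FTA rate, upper bound on detained FTA rate).
judge_cells = {
    "judge_1": [(0.10, 0.30), (0.40, 0.25)],
    "judge_2": [(0.08, 0.20), (0.15, 0.35)],
}

def misrankings(cells):
    # A misranking: some cell's released FTA rate exceeds another (or its
    # own) cell's detained upper bound, so no feasible detained-arm
    # distribution can rationalise the choices.
    return sum(1 for r, _ in cells for _, ub in cells if r > ub)

print({j: misrankings(c) for j, c in judge_cells.items()})
# → {'judge_1': 2, 'judge_2': 0}
```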

Key Findings

  • At least 20% of judges make systematic mistakes
  • Mistakes concentrated in:
    • Tails of risk distribution
    • Cases with black defendants
  • Results robust to various specifications

Policy Implications

Algorithmic Replacement Analysis:

  • Full automation: Mixed results
  • Targeted automation of high-mistake decisions: Dominates status quo
  • Behavioral analysis helps identify where automation most valuable

Conclusions

  • Framework enables testing for prediction mistakes under weak assumptions
  • Can characterize magnitude and nature of mistakes
  • Results informative for algorithm adoption decisions
  • Extensions to learning, dynamics remain open questions

Thank You

Limitations & QJE Contribution Discussion

Key Limitations

  • Assumes static decision-making framework
    • No modeling of learning or adaptation over time
    • Doesn’t capture dynamic evolution of beliefs
  • Requires specification of exclusion restrictions
    • Results sensitive to assumptions about utility function
  • Cannot fully separate preference heterogeneity from prediction mistakes
    • Some forms of unobserved preference variation could explain results

QJE Contribution Merit

  1. Methodological Innovation
    • First unified econometric framework for analyzing prediction mistakes
    • Bridges behavioral economics and econometric identification
    • Minimal assumptions compared to existing structural approaches
  2. Policy Relevance
    • Timely contribution to algorithm adoption debate
    • Framework applicable across multiple domains
    • Clear policy implications for institutional design
  3. Empirical Rigor
    • Clean identification strategy
    • Robust to multiple specifications
    • Novel use of quasi-experimental variation
  4. Theoretical Foundation
    • Links behavioral economics with decision theory
    • Provides testable implications of rational expectations
    • Bridges theoretical and empirical literatures

Discussion Questions

  • How might this framework extend to settings with learning?
  • What are implications for mechanism design?
  • How does this contribute to behavioral vs. rational expectations debate?